Loading up the data for some preliminary analyses of the binary climb/no-climb categories.
Note that the indices are all caps, and the linear measurements have lowercase letters after the first.
dat <- read_sheet("https://docs.google.com/spreadsheets/d/1-eknhyZ1JNnXqhg2kViyzVntC8NGZvILQX-aQQb1Jvk/edit#gid=325036460", na = c("NA", "?", "")) %>%
select(!NOTES) %>%
# Recode Ordinal Rankings
mutate(Loc_Ord = case_when(Loc_mode_Ordinal == "G" ~ 1,
Loc_mode_Ordinal == "A" ~ 2,
Loc_mode_Ordinal == "Sc" ~ 3,
Loc_mode_Ordinal == "T" ~ 4,
Loc_mode_Ordinal == "Is" ~ 5,
Loc_mode_Ordinal == "Sf" ~ 5,
Loc_mode_Ordinal == "Ss" ~ 6,
TRUE ~ NA),
Loc_Ord = as.ordered(Loc_Ord),
Loc_bin = case_when(Loc_mode_Bindary == "Ground" ~ 0,
Loc_mode_Bindary == "Tree" ~ 1,
TRUE ~ NA
),
# Loc_bin = as.factor(Loc_bin),
Loc_mode_Categorical = as.factor(Loc_mode_Categorical),
log_Mass = log(Mass_grams)) %>%
relocate(Loc_bin, .after = Loc_mode_Bindary) %>%
relocate(Loc_Ord, .after = Loc_mode_Ordinal) %>%
relocate(log_Mass, .before = Skl) %>%
#Calculate Ratios
mutate(SI = Sh / Sl, # Scapular Index
HRI = Hsw / Hl, # Humeral Robustness Index
HPI = Hpw / Hl, # Humeral Proximal Index
HEB = Hdw / Hl, # Humeral Epicondyle Breadth
HHRI = Hhl / Hl, # Humeral Head Robustness Index
HHW = Hhw / Hhl, # Humeral Head Shape Index
DI = Hdcw / Hsw, # Deltopectoral Crest Index
OLI = Uol / Ul, # Olecranon Process Length Index
BI = Rl / Hl, # Brachial Index
IM = (Hl+Ul)/(Fl+Tl), # Intermembral Index
PRTI = Mcl/(Hl+Rl), # Palm Robustness Index
MRI = Mcw / Mcl, # Metacarpal Robustness
MANUS = Ppl / Mcl, # MANUS index
MANUS2 = (Ppl+Ipl)/Mcl, # MANUS index with intermed. phalanx
IRI = Fgh / Fl, # Gluteal Index
FRI = Fsw / Fl, # Femoral Robustness
FEB = Fdw / Fl, # Femoral Epicondyle Breadth
CI = Tl / Fl, # Crural Index
TRI = Tmw / Tl, # Tibial Robustness Index
#ANR = Anl / Al, # Astragular Neck Robustness Index
#CAR = Cal / Cl, # Calcaneal Robustness Index
IRI = Il / Pel, # Illium Robustness Index
PR = Il / Isl, # Pelvic Index
PES = Pppl / Mtl, # PES INdex
PES2 = (Pppl+Pipl)/Mtl # PES with intermediate Phalanx
) %>%
mutate_at(vars(17:71), log) %>%
mutate_at(vars(16:93), scale2)
## ! Using an auto-discovered, cached token.
## To suppress this message, modify your code or options to clearly consent to
## the use of a cached token.
## See gargle's "Non-interactive auth" vignette for more details:
## <https://gargle.r-lib.org/articles/non-interactive-auth.html>
## ℹ The googlesheets4 package is using a cached token for 'jonnations@gmail.com'.
## ✔ Reading from "Master_Data".
## ✔ Range 'all data'.
What does the missing data look like?
You can scroll through the table below
n = nrow(dat)
dat %>% select(16:93) %>%
summarise_all((~ sum(is.na(.)))) %>%
mutate_if(is.double, ~ n - .) %>%
pivot_longer(everything(), names_to = "measurement", values_to = "count_missing") %>% arrange(desc(count_missing), measurement) %>%
mutate(percent_missing = round(count_missing / n, digits = 3)) %>%
kbl(caption = "Percentage of Missing Data") %>%
kable_classic(full_width = F, html_font = "Cambria") %>%
scroll_box(width = "500px", height = "200px")
| measurement | count_missing | percent_missing |
|---|---|---|
| Pdpw | 324 | 0.773 |
| Pipw | 278 | 0.663 |
| Pdpl | 277 | 0.661 |
| Dpw | 263 | 0.628 |
| Jl | 249 | 0.594 |
| Anl | 239 | 0.570 |
| Atw | 239 | 0.570 |
| Cal | 239 | 0.570 |
| Ccw | 238 | 0.568 |
| Csw | 238 | 0.568 |
| Ctl | 238 | 0.568 |
| Ctw | 238 | 0.568 |
| Al | 237 | 0.566 |
| Pppw | 230 | 0.549 |
| Skl | 228 | 0.544 |
| Mtw | 223 | 0.532 |
| Ipw | 217 | 0.518 |
| Dpl | 216 | 0.516 |
| PES2 | 194 | 0.463 |
| Pipl | 193 | 0.461 |
| Fbdw | 189 | 0.451 |
| Fbmw | 187 | 0.446 |
| Fgh | 187 | 0.446 |
| Fhd | 187 | 0.446 |
| HHRI | 185 | 0.442 |
| HHW | 185 | 0.442 |
| Hhl | 185 | 0.442 |
| Hhw | 185 | 0.442 |
| Ppw | 158 | 0.377 |
| Cl | 154 | 0.368 |
| Fbpw | 153 | 0.365 |
| MRI | 152 | 0.363 |
| Mcw | 152 | 0.363 |
| DI | 144 | 0.344 |
| Hdcw | 144 | 0.344 |
| Tdw | 141 | 0.337 |
| Tpw | 138 | 0.329 |
| Fbl | 137 | 0.327 |
| IRI | 137 | 0.327 |
| Isl | 137 | 0.327 |
| PR | 137 | 0.327 |
| Pel | 137 | 0.327 |
| Il | 136 | 0.325 |
| HPI | 135 | 0.322 |
| Hpw | 135 | 0.322 |
| Ipl | 117 | 0.279 |
| MANUS2 | 117 | 0.279 |
| PES | 95 | 0.227 |
| Pppl | 94 | 0.224 |
| Mtl | 88 | 0.210 |
| MANUS | 23 | 0.055 |
| Ppl | 23 | 0.055 |
| Mcl | 17 | 0.041 |
| PRTI | 17 | 0.041 |
| log_Mass | 10 | 0.024 |
| TRI | 3 | 0.007 |
| Tmw | 3 | 0.007 |
| CI | 2 | 0.005 |
| FEB | 2 | 0.005 |
| Fdw | 2 | 0.005 |
| IM | 2 | 0.005 |
| Tl | 2 | 0.005 |
| FRI | 1 | 0.002 |
| Fl | 1 | 0.002 |
| Fsw | 1 | 0.002 |
| SI | 1 | 0.002 |
| Sh | 1 | 0.002 |
| Sl | 1 | 0.002 |
| BI | 0 | 0.000 |
| HEB | 0 | 0.000 |
| HRI | 0 | 0.000 |
| Hdw | 0 | 0.000 |
| Hl | 0 | 0.000 |
| Hsw | 0 | 0.000 |
| OLI | 0 | 0.000 |
| Rl | 0 | 0.000 |
| Ul | 0 | 0.000 |
| Uol | 0 | 0.000 |
Preliminary data analysis, looping over all of the variables to see which ones do a good job predicting the binary tree vs. no-tree categorization
These are very preliminary data, and the results will become more
Here are all the model plots. On the y axis, 1 is TREE, 0 is NO TREE. The x axis is the phenotype, mean centered on 0 and scaled to a standard deviation of 1. All of the linear measurements are log transformed prior to mean-centering. All the models include log_mass as a variable, meaning that they are “size corrected” representations of the effect of the phenotype on climbing. What we are looking for is a slope that ranges across the whole y axis, meaning that it touches the 1 and 0, and has a steep slope (either up or down). To interpret, look at the 3rd plot, Hl. As Hl increases, the probability of being TREE increases.
Here are some standouts: (remember, these are log-transformed and size-corrected effect sizes)
for(i in fit_list2){
plot <- plot(conditional_effects(i), plot=F, points = T)[[1]]
print(plot)
}